Abstract

We address a version of the set-cover problem in which the sets are not known initially (hence
the name covert), but we can query an element to find out which sets contain it, and we can
query a set to learn its elements. We want to find a small set cover using a minimal number of such
queries. We present a Monte Carlo randomized algorithm that approximates an optimal set cover of size
OPT within an $O(\log N)$ factor with high probability using $O(\mathrm{OPT} \cdot \log^2 N)$ queries, where $N$ is the input
size.

We apply this technique to the network discovery problem, which involves certifying all the edges and
non-edges of an unknown $n$-vertex graph based on layered-graph queries from a minimal number of
vertices. By reducing it to the covert set-cover problem, we present an $O(\log^2 n)$-competitive Monte
Carlo randomized algorithm for the covert version of the network discovery problem. The previously best
known algorithm [4] has a competitive ratio of $O(\sqrt{n \log n})$, and therefore our result achieves an
exponential improvement.

1 Introduction 

Given a ground set $S$ with $n'$ elements and a family of sets $S_1, S_2, \ldots, S_{m'}$, where $S_i \subseteq S$, a cover $C$ is
a collection of sets from this family whose union is $S$. It is known that finding a cover consisting of the
minimum number of sets is a computationally intractable problem [9]. There are many strategies [6, 11, 14]
to approximate the smallest cover within a factor of $O(\log n')$, which is known to be the best possible unless
P = NP.

In this paper, we consider the following version of the set-cover problem. Although we know $m'$ and $n'$ (footnote 1),
we know neither the elements nor the cardinality of any of the sets $S_i$. We are allowed to query an element
$e \in S$, which returns all sets $S_i$ that contain $e$; we refer to this as a hitting-set query. We can also query a
set to learn its elements. We would like to compute a small set cover of $S$ using a minimal number of such
queries. More specifically, if OPT is the minimum size of a set cover for an instance of the problem, we
would like to find a set cover of size $O(\mathrm{OPT} \cdot \mathrm{polylog}\, n')$ using only $O(\mathrm{OPT} \cdot \mathrm{polylog}\, n')$ queries. Note
that by using $\min\{m', n'\}$ queries, we can reduce the problem to the standard version, but the number of queries may
not satisfy $O(\mathrm{OPT} \cdot \mathrm{polylog}\, n')$. By restricting the number of queries to be close to OPT, an algorithm
cannot afford to learn the contents of all the sets, yet it is required to find a cover close to the optimal.

* A preliminary version of these results appeared earlier in the 4th Workshop on Algorithms and Computation 2010.
1 We have chosen $n'$, $m'$ as notation to keep them distinct from graphs with $n$ vertices and $m$ edges.

This formalization is also distinct from the online problems addressed in (TJIll where the sets are known 
but the adversary chooses a subset of the ground set for which a minimal cover must be computed. An adversary
chooses the elements one after the other, and the online algorithm must maintain a cover of the elements
revealed up to a given stage. There is no apparent relationship between the two versions. In one case, the
initial sets are not known but the algorithm can choose the elements for hitting set queries whereas in the 
online case, the sets are known but the adversary chooses the elements. Moreover, the number of queries is 
also a measure of performance in the version considered here. 
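
The query model described above can be made concrete with a small sketch. The class and method names below (`CovertSetSystem`, `hitting_set_query`, `set_query`) are hypothetical and not taken from the paper; the sketch only assumes that both kinds of query are counted at unit cost.

```python
from typing import Dict, FrozenSet, Hashable, Set


class CovertSetSystem:
    """Hidden family of sets that can only be inspected through counted queries."""

    def __init__(self, sets: Dict[Hashable, Set[Hashable]]):
        # The family is stored privately; an algorithm should never read it directly.
        self._sets = {name: frozenset(members) for name, members in sets.items()}
        self.queries = 0  # number of queries issued so far (the cost measure)

    def hitting_set_query(self, element) -> FrozenSet:
        """Return the names of all sets that contain `element`."""
        self.queries += 1
        return frozenset(n for n, s in self._sets.items() if element in s)

    def set_query(self, name) -> FrozenSet:
        """Return the elements of the set called `name`."""
        self.queries += 1
        return self._sets[name]


oracle = CovertSetSystem({"S1": {1, 2, 3}, "S2": {3, 4}, "S3": {4, 5}})
print(sorted(oracle.hitting_set_query(3)))  # names of the sets containing element 3
print(sorted(oracle.set_query("S3")))       # elements of S3
print(oracle.queries)                       # both queries were counted
```

An algorithm for the covert problem would receive only such an oracle, together with $n'$ and $m'$, and would be judged both on the size of the cover it returns and on the final value of the query counter.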

Our research is motivated by the problem of discovering the topologies of large networks such as the
Internet. For large networks such as the Internet, which change frequently, it is very difficult and costly
to obtain the topology accurately. Nevertheless, such information about the network is very useful, for
example for analyzing the robustness properties of the network or for studying its routing aspects.

In order to create the topology of the network, one of the techniques used is to obtain local views of the 
network from various locations and combine them to determine the topology of the network. One can view 
this technique as an approach for discovering the topology of the network by some queries. Here, a query 
corresponds to the local view of the network from one specific location. In the real world scenario, the cost 
of answering a query is usually very high, so the objective of the network discovery problem is to find the 
map of the network using a minimal number of queries. 

Note that in the network discovery problem, we have to confirm the existence and non-existence of an 
edge between any pair of vertices. So, any query at a vertex should implicitly or explicitly confirm the 
absence or presence of edges between some pair of vertices. The Layered Graph Query Model and Distance 
Query Model are the most widely studied query models. 

Layered Graph Query Model: A query at a vertex $v$ yields the set of all edges on shortest paths from the
vertex $v$ to any other vertex reachable from $v$ in the graph. More specifically, we obtain information about
an edge $(x, y)$ iff $d(v, x)$ and $d(v, y)$ are consecutive, where $d(v, x)$ is the level of $x$ from $v$ (see Figure 1).
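
As an illustration of the layered-graph query, the following sketch computes the layers by breadth-first search and reports exactly the edges that join consecutive layers. It assumes the graph is given as an adjacency dictionary, which is only a convenience for the example (in the online problem the graph is hidden behind the query); the function name `layered_graph_query` is hypothetical.

```python
from collections import deque


def layered_graph_query(adj, v):
    """Return (dist, edges): BFS levels from v and the edges joining consecutive levels."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    # An edge lies on some shortest path from v iff its endpoints are one level apart.
    edges = {frozenset((x, y)) for x in dist for y in adj[x]
             if abs(dist[x] - dist[y]) == 1}
    return dist, edges


# A 4-cycle 1-2-3-4-1 with the chord 2-4.
adj = {1: [2, 4], 2: [1, 3, 4], 3: [2, 4], 4: [1, 3, 2]}
dist, edges = layered_graph_query(adj, 1)
print(dist)
print(sorted(tuple(sorted(e)) for e in edges))
```

In this example the chord between vertices 2 and 4 joins two vertices of the same level, so a query at vertex 1 does not report it; some other query vertex is needed to certify that pair.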

Distance Query Model: A query at a vertex $v$ yields the distances from $v$ to every vertex of the graph, i.e.,
a query at a vertex $v$ returns a vector whose $i$th component indicates the distance from $v$ to the $i$th
vertex. It is easy to see that this is a weaker query model than the Layered Graph Query Model. In the
Distance Query Model, an edge may be discovered by a combination of queries, as illustrated in Figure 2. In
the example shown in Figure 2, a query at vertex 1 discovers the non-edges {(1, 4), (1, 5), (1, 6), (2, 6), (3, 6)}
and the edges {(1, 2), (1, 3)}. A query at vertex 6 discovers the non-edges
{(1, 4), (1, 5), (1, 6), (2, 6), (3, 6), (4, 2), (5, 2), (3, 2)} and the edges {(3, 1), (1, 2), (6, 4), (6, 5)}. Combining
these two queries, we discover the edges (5, 3) and (4, 3).

In the off-line version of the network discovery problem, the network is initially known to the algorithm.
Unlike the online problem, here the goal is to compute a minimum number of queries that suffice to discover
the network. Given a network, we can verify whether what we have been given is the correct information.
Thus, we refer to the off-line version of the network discovery problem as network verification.

1.1 Prior work in network discovery 

Bejerano and Rastogi studied the problem of verifying all edges of a graph with as few queries as possible
in a model similar to the Layered Graph Query Model. For a graph with $n$ vertices, they give a set-cover
based $O(\log n)$-approximation algorithm and show that the problem is NP-hard. In contrast to Bejerano
and Rastogi, we are interested in verifying (or discovering) both the edges and the non-edges of a graph. It
turns out that the network verification problem was considered earlier as a problem of placing landmarks in graphs
[13]. The problem was shown to be NP-complete and an $O(\log n)$-approximation algorithm was presented.
Beerliova et al. proved an $\Omega(\log n)$ lower bound on the approximation factor for any polynomial time
algorithm for network verification in the Layered Graph Query Model unless P = NP.

Figure 1: A query at a vertex $v_1$ in the layered graph model (a) yields certificates for the edges in (b) and
the non-edges in (c).

Figure 2: The edges (5, 3) and (4, 3) of the graph (a) are discovered by the combination of queries at vertex 1
in (b) and at vertex 6 in (c) in the distance query model; the distances are depicted via layers of the graph.

In the online version of the problem, the network (graph) is unknown to the algorithm. To decide the 
next query, the algorithm can only use the knowledge about the network it has gained from the answers of 
previously asked queries. Thus, the difficulty in selecting good queries arises from the fact that we only 
have the partial information about the network. 

For the network discovery problem, Beerliova et al. [4] have shown an $\Omega(\sqrt{n})$ lower bound on the
competitive ratio of any deterministic online algorithm and an $\Omega(\log n)$ lower bound for any randomized
algorithm in the Distance Query Model. The best known algorithm in the Distance Query Model is a
randomized online algorithm which is $O(\sqrt{n \log n})$-competitive [4]. In contrast, for the Layered Graph Query
Model, Beerliova et al. [4] have shown that no deterministic online algorithm can be $(3 - \epsilon)$-competitive for
any $\epsilon > 0$. The best known algorithm in this model before this work is an $O(\sqrt{n \log n})$-competitive online
randomized algorithm [4], which leaves an exponential gap between the best known lower and upper bounds
for the Layered Graph Query Model.

In this paper, we present a randomized Monte Carlo online algorithm with a competitive ratio of $O(\log^2 n)$
for the Layered Graph Query Model, thereby nearly closing this exponential gap.

1.2 Our results and techniques 

The network verification problem can be solved by reducing it to an appropriate instance of the set-cover 
problem (or hitting set problem). Hence, we obtain an O(logn) approximation algorithm for the network 
verification problem which is the best that we can hope to do unless P = NP. In the online network 
discovery problem, we do not know the graph a priori and hence the above reduction cannot be used directly. 
In particular, the sets are not known explicitly, so we first develop an algorithm for solving the covert version 
of the set-cover problem using queries. 

We present an algorithm that computes a set cover of size at most $O(\log(m' + n') \cdot \mathrm{OPT})$ using at most
$O(\log^2(m' + n') \cdot \mathrm{OPT})$ queries with high probability. Using this, we obtain an $O(\log^2 n)$-competitive
Monte Carlo randomized algorithm for the network discovery problem in the Layered Graph Query Model.
This is a significant improvement over the previously best known $O(\sqrt{n \log n})$-competitive algorithm ([3]).

Our algorithm for set cover simulates the greedy set-cover algorithm without any initial information about
the contents of any of the sets. We use estimation by random sampling to choose the (near) largest
cardinality set, which is the basis of the greedy algorithm. We have to compensate for the inaccuracies in
sampling by using a more careful amortization argument for proving the approximation factor. The greedy
algorithm is modified to run in $O(\log(n' + m'))$ rounds instead of the conventional $\mathrm{OPT} \cdot \log n'$ stages.

2 Preliminaries 

Let $G = (V, E)$ be a connected, undirected, unweighted graph representing a network of $n$ vertices. For
two distinct nodes $u, v \in V$, we say that $(u, v)$ is an edge if $(u, v) \in E$ and a non-edge if $(u, v) \notin E$. The
set of non-edges in $G$ is denoted by $\bar{E}$.

We assume that the set $V$ of nodes is known in advance and it is the presence or absence of edges that
needs to be discovered or verified. A query at node $v$ is denoted by query($v$).
We say that query($w$) certifies $(u, v)$ if, by using the answer to query($w$), one can confirm the
presence or absence of the edge $(u, v)$ in the graph, i.e., query($w$) implicitly or explicitly confirms whether
$(u, v) \in E$ or $(u, v) \in \bar{E}$. We associate two families of sets with the queries as follows. For a given vertex
$w \in V$, let $Q_w$ denote the set of all $(u, v) \in V \times V$ such that query($w$) certifies $(u, v)$. For a given
$(u, v) \in V \times V$, let $H_{(u,v)}$ denote the set of all vertices $w$ such that query($w$) certifies $(u, v)$. The two
definitions can be considered duals of each other.

$Q_w = \{ (u, v) \in V \times V \mid \text{query}(w) \text{ certifies } (u, v) \} \quad \forall w \in V$

$H_{(u,v)} = \{ w \in V \mid \text{query}(w) \text{ certifies } (u, v) \} \quad \forall (u, v) \in V \times V.$

The above formulation of the network discovery problem can be reduced to the set-cover problem in which,
given the collection of subsets $Q_v$ of $E \cup \bar{E}$, the goal is to find a (minimum size) subset $V' \subseteq V$ such that
$\bigcup_{v \in V'} Q_v = E \cup \bar{E}$. Therefore, querying the vertices of the set cover certifies all the edges and
non-edges, which can be used to discover the network.

In the related hitting-set problem, given the collection of subsets $H_{(u,v)}$ of $V$, the goal is to find a (minimum
size) subset $V' \subseteq V$ such that for every set $H_{(u,v)}$ there exists a vertex $v' \in V'$ with $v' \in H_{(u,v)}$.
It may be noted that the (offline) hitting-set problem is often solved by reducing it to the corresponding
set-cover problem.

In the offline verification problem, given any query model, one can find the above sets exactly, as the
graph is known. So the network verification problem can be solved by reducing it to the corresponding
set-cover problem (or hitting-set problem). Hence, we get an $O(\log n)$-approximation algorithm for the network
verification problem. As mentioned earlier, this is the best that we can hope to do for this problem unless
P = NP.

In the online network discovery problem, since we do not know the graph a priori, we cannot compute
the above sets explicitly without querying all the vertices (see footnote 2). To circumvent this problem, we develop an
algorithm for approximating the set cover using the related hitting-set queries. It can be easily seen (cf.
Section 6) that $H_{(u,v)}$ can be obtained from $Q_u$ and $Q_v$ in the context of the network discovery problem.

3 Approximating set-cover from $\epsilon$-net

Clarkson [12] presented an elegant algorithm for set cover for geometric problems with bounded VC
dimension (see [10] for a survey of such results) based on weighted $\epsilon$-nets. His algorithm is based on (weighted)
random sampling and a procedure to verify whether a family of subsets is indeed a set cover. Notice that in
our context $|C|$ queries suffice to perform this verification, where $C$ is the claimed set cover. Intuitively, if
an element is covered by an $\epsilon$ fraction of the sets, then it will be covered by the $\epsilon$-net. For the remaining
elements, the algorithm successively boosts the probability of being covered by a clever reweighting
technique. The origins of this method go back to Clarkson.

If a set has weight $w$ and the sum of the weights of all the sets is $W$, then the set is sampled with probability
$w/W$. The algorithm repeatedly picks a random sample, where in each new iteration the weights are modified,
until we obtain a set cover. The algorithm assumes that the size of the optimal set cover OPT is known and
fixes $\epsilon = \frac{1}{2k}$ where $k = |\mathrm{OPT}|$. The algorithm can be summarized as follows.

Algorithm 1 Set cover using weighted $\epsilon$-net

Initially assign every set a unit weight. Initialize set cover $C = \emptyset$.
while $C$ is not a cover do

1: Pick a weighted $\epsilon$-net $\mathcal{E}$ of size $O((1/\epsilon) \cdot \log m')$.

2: If $\mathcal{E}$ is a set cover then report $\mathcal{E}$.

3: Else, it misses at least one element, say $x$. Let $S_x$ be the family of all sets that contain the element $x$,
and double the weights of all sets in $S_x$.

It is known that with high probability, the above algorithm converges in $O(k \log(m'/k))$ iterations [12].
Since $|\mathrm{OPT}|$ is not known initially, we can use the doubling technique to guess $|\mathrm{OPT}|$ within a factor
2 by beginning with $k_0 = 1$ and $k_{i+1} = 2k_i$ as the $i$-th guess. Each iteration of the algorithm takes
$O(k_i \log m')$ queries, so the total number of queries is $O(k_i^2 \log^2 m')$. Note that it takes $O(k_i)$ queries to
verify a set cover. This yields a grand total of $\sum_{i=0}^{\log |\mathrm{OPT}|} O(2^{2i} \log^2 m') = O(|\mathrm{OPT}|^2 \log^2 m')$ queries.
This has competitive ratio roughly $|\mathrm{OPT}| \log^2 m'$, so that for $|\mathrm{OPT}| \le O(\sqrt{m'})$, the competitive ratio is
about $\sqrt{m'}$. For $|\mathrm{OPT}| \ge \sqrt{m'} / \log^2 m'$, the competitive ratio is clearly $O(\sqrt{m'})$, as $m'$ queries trivially
suffice. Therefore, by using this algorithm for network discovery in the layered graph model, we can match
the algorithm of Beerliova et al. [4].

2 While this may be necessary for some graphs like the complete graphs, in general this will lead to a poor competitive ratio.
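
The reweighting loop of Algorithm 1 can be sketched as follows. This is only an illustration under stated assumptions: the family is passed in explicitly (in the covert setting, verifying a sample would cost one set query per sampled set and finding $S_x$ would cost one hitting-set query), the value $\epsilon = 1/(2k)$ and the constant 1 in the sample size are assumed choices, and the function name `eps_net_set_cover` and the `max_iters` guard are hypothetical.

```python
import math
import random


def eps_net_set_cover(universe, sets, k, rng, max_iters=10_000):
    """Weighted-sampling set cover in the spirit of Algorithm 1 (illustrative only)."""
    universe = set(universe)
    names = list(sets)
    weight = {name: 1.0 for name in names}        # every set starts with unit weight
    eps = 1.0 / (2 * k)                            # assumed constant, see the lead-in
    sample_size = math.ceil((1 / eps) * math.log(max(len(names), 2)))
    for _ in range(max_iters):
        # Sample sets with probability proportional to their current weight.
        picked = set(rng.choices(names, weights=[weight[n] for n in names], k=sample_size))
        covered = set().union(*(sets[n] for n in picked))
        missed = universe - covered
        if not missed:
            return picked                          # the sample is a set cover
        x = min(missed)                            # any missed element would do
        for n in names:
            if x in sets[n]:
                weight[n] *= 2                     # double the weights of S_x
    return None                                    # guard: gave up after max_iters


family = {"A": {1, 2, 3}, "B": {4, 5, 6}, "C": {1, 4}, "D": {2, 5}, "E": {3, 6}}
cover = eps_net_set_cover(range(1, 7), family, k=2, rng=random.Random(0))
print(cover)
```

Because the sample has size about $(1/\epsilon) \log m'$, the returned cover may contain more than $k$ sets; the guarantee discussed in the text is on the approximation factor, not on exact optimality.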

4 A near-optimal algorithm 

In the conventional greedy set-cover algorithm, we choose a set $s_{max}$ that covers the maximum number of
uncovered elements, say $n_{max}$, and add it to the cover. This leads to a $\log n'$ approximation. Instead, if we
choose any set that covers at least half of $n_{max}$ uncovered elements, then it gives a $2 \log n'$ approximation.
Recall that $n'$, $m'$ denote the number of elements and the number of sets respectively. More generally, if we
choose a set that covers at least $\frac{1}{c'} n_{max}$ elements, then we obtain a $c' \log n'$ approximation. We consider a
version of this Relaxed Greedy-Set-Cover (RGSC) where we repeat the following in stages $1, 2, \ldots, \log n'$.
At any stage we identify all the sets that contain at least $\frac{1}{2} n_{max}$ uncovered elements. We consider these
sets in an arbitrary but fixed ordering O and include those sets that contribute at least $\frac{1}{2} n_{max}$ uncovered
elements after deleting the elements that have already been covered by sets chosen earlier. Note that the sets that
are included depend on O; however, at the end of this stage, there will not be any set that contains
$n_{max}/2$ or more uncovered elements. Since any such ordering O corresponds to a valid run of RGSC, this
yields a $2 \log n'$ approximation guarantee; see the Appendix for a formal proof.
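
The staged relaxed greedy procedure described above can be sketched on an explicitly known family as follows. The function name `relaxed_greedy_set_cover` and the dictionary representation are assumptions made for the example; the threshold is the factor one half used in the text.

```python
def relaxed_greedy_set_cover(universe, sets, order):
    """Per stage, take (in the fixed order) every set that still contributes
    at least half of the stage's n_max uncovered elements."""
    uncovered = set(universe)
    cover = []
    while uncovered:
        # n_max is fixed at the start of the stage.
        n_max = max(len(sets[name] & uncovered) for name in order)
        if n_max == 0:
            raise ValueError("the family does not cover the universe")
        for name in order:
            gain = sets[name] & uncovered   # elements not covered by earlier choices
            if 2 * len(gain) >= n_max:
                cover.append(name)
                uncovered -= gain
    return cover


family = {"A": {1, 2, 3, 4}, "B": {3, 4, 5}, "C": {5, 6}, "D": {6}}
print(relaxed_greedy_set_cover(range(1, 7), family, ["A", "B", "C", "D"]))
```

In this instance the first stage has $n_{max} = 4$: set A is taken, B then contributes only one new element and is skipped, and C contributes two new elements (exactly $n_{max}/2$) and is taken, which completes the cover.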

Our algorithm is based on simulating this approach, where we try to estimate the value of $n_{max}$
indirectly using random sampling. In round $i$, we check for $n_{max} \in [\frac{n'}{2^{i+1}}, \frac{n'}{2^i}]$ by choosing a random
set of uncovered elements of an appropriate size. Using hitting set queries, we find the sets containing these
randomly chosen elements. We choose an appropriate number of uncovered elements, so that the sample hits the sets
of this size with high probability. We consider the sets in a fixed order, and if a set contains at least
a threshold number of randomly picked elements, then we include the set in the set cover.
Because of the estimation using random sampling, we lose a factor $c' > 2$ in the underlying RGSC, as we
may choose some sets which contain fewer than $n_{max}/2$ uncovered elements (but at least $n_{max}/c'$).

Algorithm Pseudo-Greedy, described below, selects all sets containing at least $n_{max}/2$ uncovered
elements and discards the sets containing fewer than $\frac{1}{c'} n_{max}$ uncovered elements, for $4 \le c' \le 8$, with high
probability.

3 The notation $n_{max}$ refers to the maximum in the current round $i$.

We assume that the sets are numbered in some canonical order. In the specific application of the network
discovery problem, this ordering is implicit (the vertex ordering $\{v_1, v_2, \ldots, v_n\}$ induces a canonical ordering on the
collection $Q_v$ of sets). In the general setting, we assume that such an ordering exists or can be easily
computed.

In Algorithm 2, $N$ denotes the cardinality of the ground set plus the number of sets in the given family
($N = n' + m'$). In the case of the Network Discovery problem, $N = O(|V|^2)$. In round $i$, we try to identify
the sets containing at least $s_i/2$ uncovered elements.

Algorithm 2 Pseudo-Greedy

Initialize set cover $C = \{\}$.
for $i = 0, 1, \ldots$ do

1: Let $n_i$ be the number of elements left in this round and $s_i = \min\{n'/2^i, n_i\}$. Choose a random sample $R^i$
of size $(4\alpha n_i / s_i) \log N$.

Comment: $\alpha$ is a constant whose value will be determined in the analysis.
2: If $s_i < \alpha \log N$ then solve the hitting set problem directly using at most $n_i$ hitting set queries and run
the explicit greedy set-cover algorithm.
3: Else (if $s_i \ge \alpha \log N$), let $S^i$ be the sets that contain more than $\alpha \log N$ sampled elements.
If $S^i$ is empty, increment $i$ and go to step 1.
4: Process $S^i = \{X_1, X_2, \ldots\}$ in some predefined order until all sets are exhausted.

(i) Let $R_j$ be the union of elements of $R^i$ that are contained in the sets chosen among $X_1, X_2, \ldots, X_j$.

(ii) $C = C \cup X_{j+1}$ if $|X_{j+1} \cap (R^i \setminus R_j)| > \alpha \log N$ (else discard $X_{j+1}$).

(iii) Update $R_j$ to $R_{j+1}$ using set queries.

5: Update the elements covered by the sets chosen in this round using set queries.
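
The steps of Algorithm 2 can be sketched in code as follows. This is a simplified illustration, not the authors' implementation: the `Oracle` class, the function names, the use of natural logarithms, the sampling without replacement, and the capping of the sample size at $n_i$ are all assumptions made for the example.

```python
import math
import random


class Oracle:
    """Hypothetical query oracle over a hidden family; counts every query."""

    def __init__(self, sets):
        self.sets = {k: frozenset(v) for k, v in sets.items()}
        self.queries = 0

    def hitting_set_query(self, e):
        self.queries += 1
        return [n for n, s in self.sets.items() if e in s]

    def set_query(self, name):
        self.queries += 1
        return self.sets[name]


def explicit_greedy(uncovered, known):
    """Plain greedy on the explicitly learned incidences (used in Step 2)."""
    uncovered, cover = set(uncovered), []
    while uncovered:
        best = max(known, key=lambda n: len(known[n] & uncovered), default=None)
        if best is None or not (known[best] & uncovered):
            raise ValueError("some element belongs to no set")
        cover.append(best)
        uncovered -= known[best]
    return cover


def pseudo_greedy(universe, set_names, oracle, alpha, rng):
    n_prime = len(universe)
    N = n_prime + len(set_names)
    threshold = alpha * math.log(N)
    uncovered, cover, i = set(universe), [], 0
    while uncovered:
        n_i = len(uncovered)
        s_i = min(n_prime / 2 ** i, n_i)
        if s_i < threshold:
            # Step 2: one hitting-set query per remaining element, then explicit greedy.
            known = {}
            for e in uncovered:
                for name in oracle.hitting_set_query(e):
                    known.setdefault(name, set()).add(e)
            return cover + explicit_greedy(uncovered, known)
        # Step 1: random sample R^i of the uncovered elements.
        size = min(n_i, math.ceil(4 * alpha * n_i / s_i * math.log(N)))
        sample = set(rng.sample(sorted(uncovered), size))
        hits = {}
        for e in sample:
            for name in oracle.hitting_set_query(e):
                hits.setdefault(name, set()).add(e)
        # Step 3: shortlist the sets hit by more than alpha*log N sampled elements.
        shortlist = [n for n in set_names if len(hits.get(n, ())) > threshold]
        # Step 4: scan the shortlist in the canonical order, counting only fresh samples.
        remaining = set(sample)
        for name in shortlist:
            if len(hits[name] & remaining) > threshold:
                members = oracle.set_query(name)
                cover.append(name)
                remaining -= members
                uncovered -= members      # Step 5
        i += 1
    return cover


family = {"A": range(0, 6), "B": range(6, 12), "C": {0, 6}, "D": {1, 7}}
oracle = Oracle(family)
print(pseudo_greedy(set(range(12)), list(family), oracle, alpha=1.0, rng=random.Random(0)))
print(oracle.queries)
```

On such a tiny instance the sample is the whole universe, so the example only exercises the control flow; the savings in queries appear when $n'$ is large compared with $\mathrm{OPT} \cdot \log^2 N$.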



5 Analysis 

We begin with a rough intuition behind the previous algorithm. If the largest set has size $n'/t$, then the
minimum number of sets in any set cover is $\Omega(t)$. Therefore we can afford to query a sample of size
approximately $O(t \cdot \mathrm{polylog}\, n')$ elements without blowing up the competitive ratio. In this context, note
that a uniform random sample of size $O(t \cdot \mathrm{polylog}\, n')$ will have $\Theta(\mathrm{polylog}\, n')$ elements in common with a
set of size $n'/t$ with high probability. However, if there are $\Omega(t)$ sets of size $O(n'/t)$, we cannot afford to
sample repeatedly for finding these sets. The above observations form the crux of the analysis, which is now
formalized.
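
As a back-of-the-envelope check of this intuition (a sketch only; the constant $c$ and the concrete sample size $c\, t \log n'$ are placeholders, not values from the analysis), let $R$ be a uniform random sample of the $n'$ elements and $T$ a set of size $n'/t$. By linearity of expectation:

```latex
\mathbb{E}\,|R \cap T|
  \;=\; |R| \cdot \frac{|T|}{n'}
  \;=\; \bigl(c\, t \log n'\bigr) \cdot \frac{n'/t}{n'}
  \;=\; c \log n' .
```

So a sample whose size is proportional to $t$ sees each set of size $n'/t$ a logarithmic number of times in expectation, which is the regime in which Chernoff bounds give concentration.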

Lemma 5.1 In round $i$, in Step 3, the following holds with high probability:

(i) If a set $T$ contains at least $s_i/2$ elements, then with high probability it will have at least $\alpha \log N$ sampled
elements.

(ii) Any set $T$ chosen in Step 3 will contain at least $\frac{1}{c'} s_i$ elements, for $4 \le c' \le 8$, with high probability.

Proof. Let $T$ be a set where $m \ge |T| \ge m/2$. Suppose we sample every element independently with
probability $p$. The number of sampled elements $Y$ has expectation $E[Y]$ with $mp \ge E[Y] \ge mp/2$. From Chernoff
bounds,

$\Pr[(1 + \epsilon) mp \ge Y \ge (1 - \epsilon) mp/2] \ge 1 - 2e^{-mp\epsilon^2/4}$

Choosing $\epsilon = 1/2$, we get

$\Pr[\tfrac{3}{2} mp \ge Y \ge mp/4] \ge 1 - 2e^{-mp/16}$

In round $i$, each element is picked independently with probability $(4\alpha/s_i) \log N$; therefore, the expected
number of hits in a set of size $m$ is $(4m\alpha/s_i) \log N$. From Chernoff bounds, by substituting $m = s_i$,

$\Pr[6\alpha \log N \ge Y \ge \alpha \log N] \ge 1 - 2e^{-(\alpha/4) \log N} = 1 - 2/N^{\alpha/4}$

Since the number of such $T$ is less than $N$, the algorithm picks all sets containing at least $s_i/2$ uncovered
elements with high probability. On the other hand, let $T$ be any set chosen in Step 3 of the algorithm. Then, by
applying the Chernoff bound, we get

$\Pr[|T| < s_i/c'] \le e^{-((c'-2)\alpha/8) \log N} = 1/N^{(c'-2)\alpha/8}$

for all $4 \le c' \le 8$. ■

Lemma 5.2 If round $i$ takes $O(\frac{n_i}{s_i} \cdot f(N))$ queries, then the set cover can be found using $O(n_g \cdot f(N))$
queries, where $n_g$ is the size of the set cover returned by the underlying RGSC algorithm.

Proof. In round $i$, we include in the cover all those sets that cover at least $s_i/2$ additional elements. In round
$i$, let us distribute the cost uniformly over the remaining elements, i.e., each of the $n_i$ elements is charged
$O(f(N)/s_i)$. If an element is covered by a set chosen in round $i$, then it is not charged in the subsequent
rounds. So the total cost over all the rounds for element $x$ is

$C(x) \le f(N) \cdot \left( \frac{1}{s_i} + \frac{1}{2 s_i} + \frac{1}{2^2 s_i} + \frac{1}{2^3 s_i} + \cdots \right) \le 3c' \frac{f(N)}{|s(x)|}$

where $s(x)$ is the set that first covers element $x$ and $s_i/c' \le |s(x)| \le s_i$. The constant $c'$ refers to
the constant in the previous lemma. Therefore

$\sum_x C(x) \le 3c' f(N) \sum_x \frac{1}{|s(x)|}$

The summation represents the cost of the underlying RGSC algorithm and therefore, it is bounded by
$3c' f(N) \cdot n_g$ (see Lemma 7.1 in the Appendix).

Note that the underlying RGSC algorithm is a $c' \log N$ approximation to the set cover. ■

Theorem 5.3 Algorithm 2 returns a set cover of size at most $O(\log N \cdot \mathrm{OPT})$ using at most $O(\log^2 N \cdot
\mathrm{OPT})$ queries with high probability.

Proof. In our algorithm, $f(N)$ is $O(\log N)$ and the maximum number of iterations is $O(\log N)$. When
$s_i < \alpha \log N$, we solve the problem directly using at most $n_i$ hitting set queries, and explicitly run the
greedy set-cover. Since the largest set has size $n'/2^i$, the size of the optimal cover is $\Omega(n_i / \log N)$,
and therefore the number of queries is $O(\log N \cdot \mathrm{OPT})$. In order to prove the theorem, we will show
that the bound on $n_g$ in Lemma 5.2 is $O(\log N \cdot \mathrm{OPT})$. So, we must establish that the sets shortlisted in
Step 3 of the algorithm and finally included in the cover in Step 4 are only those sets (on the basis of their
estimates) that cover at least $s_i/c'$ uncovered elements. In particular, we must guard against oversampling
of the uncovered elements of any set at the time it is considered for inclusion in a given round. Even though
the sets were appropriately sampled in Step 3, at the time a set is considered in Step 4(ii), the sampling of
its remaining part must still be accurate enough; arguing this directly may require considering an exponential
number of possibilities, depending on the order of inclusion.

To avoid this, let us assume that we consider the sets of $S^i$ in increasing order of their indices. Let
$X_1, X_2, \ldots$ be the sets of $S^i$ in this canonical ordering that contain at least $s_i/c'$ elements. Now consider
a hypothetical ordering O of the elements based on this ordering of the sets. Namely, all elements of $X_i$
are numbered smaller than those of $X_{i+1}$, and the numbering within a set is arbitrary. For example, all the elements
in $X_1 \setminus X_2$ are numbered before those of $X_1 \cap X_2$, and the elements of $X_2 \setminus X_1$ come last. Suppose the elements are
sampled according to O. We define $X'_i$ as the set of all uncovered elements in $X_i$ after $X_1, X_2, \ldots, X_{i-1}$ have been
considered, and (hypothetically) sample the elements in $X'_i$ according to O.

We consider $X'_i$ to be under-sampled if $|X'_i| \ge s_j$, where $j$ denotes the current round, but the number of
sampled elements intersecting $X'_i$ (not including $X_i \setminus X'_i$) is less than $\alpha \log N$. We analogously define
oversampling for $X'_i$.

We say that a bad event has occurred in round $j$ if any of the sets $X'_i$ is under-sampled or oversampled,
and let the complement of this event be $Z_j$. From Lemma 5.1 we can bound the probability of
under-sampling and oversampling such that $\Pr[Z_j] \ge 1 - 3/N^{(\alpha/4) - 1}$ (by choosing $c' \ge 4$). Let $A_i$ be the event
that no under-sampling or oversampling occurs for $X'_1, X'_2, \ldots, X'_i$. Then,

$\Pr[A_i] = \Pr[A_{i-1} \cap Z_i] = \Pr[Z_i \mid A_{i-1}] \cdot \Pr[A_{i-1}]$

Therefore,

$\Pr[A_i] \ge \Pr[A_{i-1}] \cdot (1 - 3/N^{(\alpha/4) - 1}) \ge (1 - 3/N^{(\alpha/4) - 1})^i$ for $i \le N$

By choosing a sufficiently large $\alpha$, this is at least $1 - 1/N^2$. Since this holds for all $j \le O(\log N)$ rounds,
this also bounds the failure probability of our algorithm. ■
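
The phrase "sufficiently large $\alpha$" can be made explicit with a short calculation. This is a sketch that assumes the per-step bound reads $3/N^{(\alpha/4)-1}$ and uses Bernoulli's inequality; the concrete choice $\alpha \ge 20$ is ours and is not claimed by the original analysis.

```latex
\Bigl(1 - \frac{3}{N^{\alpha/4 - 1}}\Bigr)^{i}
  \;\ge\; 1 - \frac{3\,i}{N^{\alpha/4 - 1}}
  \;\ge\; 1 - \frac{3}{N^{\alpha/4 - 2}}
  \qquad (i \le N),
\qquad\text{and for } \alpha \ge 20,\ N \ge 3:\quad
\frac{3}{N^{\alpha/4 - 2}} \;\le\; \frac{3}{N^{3}} \;\le\; \frac{1}{N^{2}} .
```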

Remarks: (i) The bounds do not depend on O and hold for any parallel sampling method.

(ii) We say that the algorithm fails if, in any of the rounds, it does not pick all sets containing at least $s_i/2$
uncovered elements or picks any set containing fewer than $s_i/c'$ uncovered elements. The sizes of the sets that
are chosen satisfy the above-mentioned bounds with high probability; otherwise, the algorithm
is deemed to have failed. Note that the bound of Lemma 5.2 also holds with the same probability. Since
we do not verify these properties, we obtain a Monte Carlo algorithm.

(iii) A deterministic algorithm picks all the sets of size at least $s_i/2$; while our randomized algorithm
also chooses all sets of size at least $s_i/2$, it may pick some sets which are a little smaller (but greater than $s_i/c'$).

6 Network Discovery 

The off-line problem of network verification can be reduced to a set-cover problem. In the online version, we
do not want to compute the sets explicitly since this will lead to a poor competitive ratio in many situations.
So we solve the problem by using hitting-set queries as described in the previous section, which gives us an
estimate of the set sizes. In our setting, the hitting-set problem is defined on the sets $H_{(u,v)}$ and the set-cover
problem on the sets $Q_v$. During any stage, random sampling is done on the set of unresolved edges to obtain
estimates of $Q_v$ by querying $Q_x$ and $Q_y$, where $(x, y)$ is a sampled edge.

Recall that in the Layered Graph Query Model, a query at a vertex $v$ yields the set of all edges on shortest
paths between $v$ and any other vertex. Now, we observe that this query model is equivalent to the model
in which a query at vertex $v$ yields all edges and non-edges between vertices of different distances from $v$.
Note that an edge connects two vertices of different distance from $v$ if and only if it lies on a shortest path
between $v$ and one of these two vertices. The shortest path tree rooted at $v$ implicitly confirms the absence of all
other edges between vertices of different distance from $v$. So given an edge or non-edge whose status is not yet
resolved, say $(v, u)$, we query both the end points $v$ and $u$ to determine the distances of all nodes to $u$ and
$v$. From this we can deduce the set $H_{(u,v)}$ of nodes from which the edge or non-edge between $u$ and $v$ can
be discovered: $H_{(u,v)} = \{ x \in V \mid d(u, x) \ne d(v, x) \}$, where $d(s, x)$ denotes the distance from $s$ to $x$.
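
The computation of $H_{(u,v)}$ from the two end-point queries can be sketched as follows. The adjacency-dictionary input and the helper names `bfs_distances` and `hitting_set_for_pair` are assumptions made for illustration; in the online setting the two distance maps would come from the answers to the queries at $u$ and $v$.

```python
from collections import deque


def bfs_distances(adj, source):
    """Distances from `source` to every vertex of a connected graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def hitting_set_for_pair(adj, u, v):
    """H_(u,v): the vertices x with d(u, x) != d(v, x)."""
    du, dv = bfs_distances(adj, u), bfs_distances(adj, v)
    return {x for x in adj if du[x] != dv[x]}


# Triangle 1-2-3 with a pendant vertex 4 attached to 3.
adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
print(hitting_set_for_pair(adj, 1, 2))
print(hitting_set_for_pair(adj, 3, 4))
```

In the triangle, vertices 3 and 4 are equidistant from 1 and 2, so only a query at 1 or at 2 can certify the pair (1, 2), whereas every vertex certifies the pair (3, 4).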

Algorithm Pseudo-Greedy described in the previous section translates to the following in the
context of the network discovery problem. Randomly pick an undiscovered edge and query the set $H_{(u,v)}$.
Let $n$ be the number of vertices in the graph and let $Q$ denote the query set; this is the (approximately
minimal) set of vertices which will be used to discover the network. If $v$ is contained in at least $\alpha \log n$ of
the queried sets, include $v$ in the set cover $Q$. Actually, as in the general set-cover problem, it is a two-stage
process where we first shortlist and then run through this list in some predefined ordering, say
according to the labels of the vertices. As before, we solve the set-cover problem on $Q_v$ using a sequence of
$H_{(u,v)}$ hitting set queries. The reader can easily work out the details, which we omit to avoid repetition.

In the following algorithm $N = O(n^2)$. The algorithm takes $O(\log n)$ stages and in each stage we
make $O(\log n \cdot \mathrm{OPT})$ queries, where OPT is the optimum number of queries required to solve the network
verification problem. Since this is also optimum for the online problem, Algorithm Network Discovery
makes $O(\log^2 n \cdot \mathrm{OPT})$ queries. The algorithm yields a set of $O(\log n \cdot \mathrm{OPT})$ $Q_v$ queries that suffice to discover
the given network. Therefore the overall number of queries for the online discovery is still $O(\log^2 n \cdot \mathrm{OPT})$.

Algorithm 3 Network Discovery
for $i = 0, 1, \ldots$ do

1: Let $n_i$ be the number of edges and non-edges which need to be discovered and $s_i = \min\{n'/2^i, n_i\}$, where,
as in Algorithm 2, $n'$ is the size of the ground set (here, the number of vertex pairs).
Choose a random sample $R^i$ of size $(4\alpha n_i / s_i) \log N$.
2: If $s_i < \alpha \log N$ then find $H_{(u,v)}$ for each undiscovered edge/non-edge and solve the network
discovery problem by reducing it explicitly to the set-cover problem.
3: If $s_i \ge \alpha \log N$, for each sampled edge/non-edge $(u, v)$, find the set $H_{(u,v)}$.

4: Consider the vertices $\{v_1, v_2, \ldots\}$ in this order and include $v_j$ in $Q$ ($Q_{v_j}$ is in the set cover) only if
$Q_{v_j}$ contains more than $\alpha \log N$ sampled edges/non-edges ($v_j \in H_{(u,v)}$ for at least $\alpha \log N$ of the
$(u, v) \in R^i$).

(The actual implementation of this is similar to Steps 3-4 of Algorithm Pseudo-Greedy.)



From our earlier analysis of the covert set-cover problem it follows that 

Theorem 6.1 There is an $O(\log^2 n)$-competitive randomized Monte Carlo algorithm for the network
discovery problem in the Layered Graph Query Model.

Remark Even if we restrict the query model to return a layered graph of some bounded depth (that may
not correspond to the entire graph), the reduction to the covert set-cover problem is analogous, by modifying the
definition of the sets, and we obtain the same competitive ratio.

7 Conclusion and open problem

The algorithm described in the last section gives an $O(\log^2 n)$-competitive algorithm for the network discovery problem.
Can we improve this to $O(\log n)$? We can also consider a weighted version of the network discovery problem,
where each query at a vertex $v$ costs, say, $w_v$; it is not clear whether we can extend our approach to solve the
weighted version of the problem.

We note that in the Distance Query Model, by querying both $v$ and $u$, we can discover whether $(u, v)$ is
an edge or a non-edge. If it is a non-edge, then we can find the set $H_{(u,v)}$: a vertex $w$ is in this set if
$|d(u, w) - d(v, w)| \ge 2$. But if $(u, v)$ is an edge, then we cannot find the set $H_{(u,v)}$. It is not clear how to
determine the partial witnesses using set-cover queries as before. Therefore, it remains open whether we can
improve the known $O(\sqrt{n \log n})$ bound for the network discovery problem to an $O(\mathrm{poly}(\log n))$-approximation
randomized algorithm in the Distance Query Model.

Acknowledgement The first author is thankful to Rajeev Raman and Thomas Erlebach for introducing him
to the problem and subsequent technical discussions.

Appendix A

Chernoff bounds

If a random variable $X$ is the sum of $n$ i.i.d. Bernoulli trials with a success probability of $p$ in each trial, the
following inequalities give us concentration bounds on the deviation of $X$ from the expected value of $np$. These
are useful for small deviations from a large expected value.

$\Pr[X \le (1 - \epsilon) np] \le \exp(-\epsilon^2 np / 2)$ (1)
$\Pr[X \ge (1 + \epsilon) np] \le \exp(-\epsilon^2 np / 4)$ (2)

for all $0 < \epsilon < 1$.
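
The two bounds can be compared numerically with exact binomial tails. The following is a small sanity check rather than part of the analysis; the parameter values and the helper names `binom_lower_tail` and `binom_upper_tail` are arbitrary choices for the example.

```python
import math


def binom_lower_tail(n, p, t):
    """Exact P[X <= t] for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(0, math.floor(t) + 1))


def binom_upper_tail(n, p, t):
    """Exact P[X >= t] for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(math.ceil(t), n + 1))


n, p, eps = 200, 0.1, 0.5
mean = n * p
lower_exact = binom_lower_tail(n, p, (1 - eps) * mean)
upper_exact = binom_upper_tail(n, p, (1 + eps) * mean)
lower_bound = math.exp(-eps**2 * mean / 2)   # right-hand side of (1)
upper_bound = math.exp(-eps**2 * mean / 4)   # right-hand side of (2)
print(lower_exact, "<=", lower_bound)
print(upper_exact, "<=", upper_bound)
```

The exact tails are typically much smaller than the bounds; the analysis only needs the exponential dependence on $np$.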
Greedy set-cover

For completeness, we also sketch the proof of the approximation factor of RGSC($\theta$) for $\theta \le 1$, such that
at any step, the size of the set chosen is at least $\theta \cdot n_{max}$.

Let us number the elements of $S$ in the order they were covered by the greedy algorithm (wlog, we
can renumber such that they are $x_1, x_2, \ldots$). We will apportion the cost of covering an element $e \in S$ as
$w(e) = \frac{1}{|U - V|}$, where $e$ is covered for the first time by $U$ and $V$ is the set of elements covered till then. This is
also called the cost-effectiveness of set $U$. The total cost of the cover is

$\sum_{U} \sum_{e \in n(U)} \frac{1}{|n(U)|}$

where $n(U)$ is the subset of uncovered elements in $U$ when $U$ was chosen and $e$ is covered for the first time.
This can be rewritten as $\sum_i w(x_i)$.

Lemma 7.1

$w(x_i) \le \frac{C_o/\theta}{n - i + 1}$

where $C_o$ is the number of sets in the optimum cover.

In the iteration when $x_i$ is covered for the first time, the number of uncovered elements is $\ge n - i + 1$. The
pure greedy choice is more cost-effective than any left-over set of the optimal cover. Suppose $S_{i_1}, S_{i_2}, \ldots, S_{i_k}$
are the unselected sets of the minimum set cover. Then, at least one of them has a cost-effectiveness of
$\le \frac{k}{n-i+1} \le \frac{C_o}{n-i+1}$. It follows that the set chosen by RGSC($\theta$) achieves a cost-effectiveness of at most
$\frac{C_o}{\theta (n-i+1)}$. So $w(x_i) \le \frac{C_o}{\theta (n-i+1)}$.

Thus the cost of the greedy cover is at most $\sum_i \frac{C_o/\theta}{n-i+1}$, which is bounded by $(C_o/\theta) \cdot H_n$. Here
$H_n = \frac{1}{n} + \frac{1}{n-1} + \cdots + 1$.
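
Summing the per-element bound of Lemma 7.1 gives the harmonic factor explicitly. The following worked step is a sketch that uses the standard estimate $H_n \le 1 + \ln n$; the specialization to $\theta = 1/2$ is included only to connect with the factor $2 \log n'$ quoted in Section 4.

```latex
\sum_{i=1}^{n} w(x_i)
  \;\le\; \frac{C_o}{\theta} \sum_{i=1}^{n} \frac{1}{n - i + 1}
  \;=\; \frac{C_o}{\theta}\, H_n
  \;\le\; \frac{C_o}{\theta}\,(1 + \ln n),
\qquad\text{so for } \theta = \tfrac{1}{2}:\quad
\sum_{i=1}^{n} w(x_i) \;\le\; 2\, C_o\, (1 + \ln n).
```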